event stream
EventMG: Efficient Multilevel Mamba-Graph Learning for Spatiotemporal Event Representation
Event cameras offer unique advantages in scenarios involving high speed, low light, and high dynamic range, yet their asynchronous and sparse nature poses significant challenges to efficient spatiotemporal representation learning. Specifically, despite notable progress in the field, effectively modeling the full spatiotemporal context, selectively attending to salient dynamic regions, and robustly adapting to the variable density and dynamic nature of event data remain key challenges. Motivated by these challenges, this paper proposes EventMG, a lightweight, efficient, multilevel Mamba-Graph architecture designed for learning high-quality spatiotemporal event representations. EventMG employs a multilevel approach, jointly modeling information at the micro (single event) and macro (event cluster) levels to comprehensively capture the multi-scale characteristics of event data. At the micro level, it focuses on spatiotemporal details, employing the State Space Model (SSM)-based Mamba to precisely capture long-range dependencies among numerous event nodes. Concurrently, at the macro level, Component Graphs are introduced to efficiently encode the local semantics and global topology of dense event regions. Furthermore, to better accommodate the dynamic and sparse characteristics of event data, we propose the Spatiotemporal-aware Event Scanning Technology (SEST), which integrates the Adaptive Perturbation Network (APN) and the Multidirectional Scanning Module (MSM) and substantially enhances the model's ability to perceive and focus on key spatiotemporal patterns. By employing this novel collaborative paradigm, EventMG demonstrates the ability to effectively capture multi-level spatiotemporal characteristics of event data while maintaining a low parameter count and linear computational complexity, suggesting a promising direction for event representation learning.
GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation
Existing event datasets are often synthesized from dense RGB videos, which typically lack viewpoint diversity and geometric consistency, or depend on expensive, difficult-to-scale hardware setups. GS2E overcomes these limitations by first reconstructing photorealistic static scenes using 3D Gaussian Splatting, and subsequently employing a novel, physically-informed event simulation pipeline.
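The abstract does not describe the simulation pipeline itself, so the following is only a minimal sketch of the generic contrast-threshold model commonly used to describe event cameras: a pixel emits an event when its log intensity changes by at least a threshold. It is not GS2E's method. The function name `events_between_frames`, the `threshold` and `eps` parameters, and the one-event-per-pixel simplification (no timestamps, no noise model) are all assumptions made for illustration.

```python
import math


def events_between_frames(prev, curr, threshold=0.2, eps=1e-6):
    """Return (x, y, polarity) tuples for pixels whose log-intensity change
    between two frames reaches the contrast threshold.

    prev, curr: 2D lists of non-negative intensities with the same shape.
    Hypothetical simplification: at most one event per pixel per frame pair.
    """
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (i0, i1) in enumerate(zip(row_prev, row_curr)):
            # eps avoids log(0) on fully dark pixels.
            delta = math.log(i1 + eps) - math.log(i0 + eps)
            if abs(delta) >= threshold:
                events.append((x, y, 1 if delta > 0 else -1))
    return events


if __name__ == "__main__":
    prev = [[0.5, 0.5], [0.5, 0.5]]
    curr = [[1.0, 0.5], [0.25, 0.5]]
    print(events_between_frames(prev, curr))
```

In this toy input, only the two pixels whose brightness doubled or halved produce events; unchanged pixels stay silent, which is the source of the sparsity that the abstracts in this listing refer to.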
FLAME: Fast Long-context Adaptive Memory for Event-based Vision
We propose Fast Long-context Adaptive Memory for Event (FLAME), a novel scalable architecture that combines neuro-inspired feature extraction with robust structured sequence modeling to efficiently process asynchronous and sparse event camera data. As a departure from conventional input encoding methods, FLAME presents the Event Attention Layer, a novel feature extractor that leverages neuromorphic dynamics (Leaky Integrate-and-Fire (LIF)) to directly capture multi-timescale features from event streams. The feature extractor integrates with a structured state-space model with a novel Event-Aware HiPPO (EA-HiPPO) mechanism that dynamically adapts memory retention based on inter-event intervals to understand relationships across varying temporal scales and event sequences. A Normal Plus Low Rank (NPLR) decomposition reduces the computational complexity of the state update from O(N^2) to O(Nr), where N represents the dimension of the core state vector and r is the rank of a low-rank component (with r ≪ N). FLAME demonstrates state-of-the-art accuracy for event-by-event processing on complex event camera datasets.
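The O(N^2) to O(Nr) reduction can be illustrated with a minimal sketch, assuming the state matrix has already been brought into a real diagonal-plus-low-rank form A = diag(d) + U V^T. This is a simplification of NPLR, which in general involves a complex unitary change of basis, and it is not FLAME's implementation. The helper names `dplr_matvec`, `build_dense`, and `dense_matvec` are hypothetical.

```python
def dense_matvec(A, x):
    """Reference O(N^2) product of a dense N x N matrix with a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dplr_matvec(d, U, V, x):
    """Compute (diag(d) + U V^T) x in O(N r) without forming the N x N matrix.

    d: length-N diagonal; U, V: N x r factors stored as lists of rows.
    """
    n, r = len(d), len(U[0])
    # Step 1: project x onto the low-rank factor, t = V^T x (r values, O(N r)).
    t = [sum(V[i][k] * x[i] for i in range(n)) for k in range(r)]
    # Step 2: diagonal part plus low-rank correction, y = d * x + U t (O(N r)).
    return [d[i] * x[i] + sum(U[i][k] * t[k] for k in range(r)) for i in range(n)]


def build_dense(d, U, V):
    """Materialize diag(d) + U V^T explicitly; used only to check the fast path."""
    n, r = len(d), len(U[0])
    return [
        [(d[i] if i == j else 0.0) + sum(U[i][k] * V[j][k] for k in range(r))
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0]
    U = [[1.0], [0.0], [2.0]]  # rank r = 1
    V = [[1.0], [1.0], [0.0]]
    x = [1.0, 2.0, 3.0]
    print(dplr_matvec(d, U, V, x))
    print(dense_matvec(build_dense(d, U, V), x))
```

Both calls produce the same vector; the structured version touches only the N diagonal entries and the two N x r factors, which is where the saving comes from when r ≪ N.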
For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled? As surveillance cameras become prevalent in public spaces, using them has proven effective in proactively deterring and preventing violent incidents. However, the data collected by these cameras could potentially lead to breaches of privacy for those being filmed. Thus, we hope to find a way to capture scenes of violence while avoiding infringement on personal privacy. DVS cameras can naturally achieve this goal because they capture only events of pixel brightness changes. Existing violence detection datasets are filmed with RGB cameras, which cannot ensure privacy preservation.
Continuous Spatiotemporal Events Decoupling through Spike-based Bayesian Computation
Numerous studies have demonstrated that the cognitive processes of the human brain can be modeled using Bayes' theorem for probabilistic inference about the external world. Spiking neural networks (SNNs), capable of performing Bayesian computation with greater physiological interpretability, offer a novel approach to distributed information processing in the cortex. However, applying these models to real-world scenarios to harness the advantages of brain-like computation remains a challenge. Recently, bio-inspired sensors with high dynamic range and ultra-high temporal resolution have been widely used in extreme vision scenarios. Event streams, generated by various types of motion, represent spatiotemporal data.
EV-Eye: Rethinking High-frequency Eye Tracking through the Lenses of Event Cameras
In this paper, we present EV-Eye, a first-of-its-kind large-scale multimodal eye tracking dataset aimed at inspiring research on high-frequency eye/gaze tracking. EV-Eye utilizes the emerging bio-inspired event camera to capture independent pixel-level intensity changes induced by eye movements, achieving sub-microsecond latency.